Accounting for the Relative Importance of Objects in Image Retrieval

Authors

  • Sung Ju Hwang
  • Kristen Grauman
Abstract

Images tagged with human-provided keywords are a valuable source of data, and are increasingly available thanks to community photo sharing sites such as Flickr and various labeling projects in the vision community. Often the keywords reflect the objects and events of significance and can thus be exploited as a loose form of labels and context. Researchers have explored a variety of ways to leverage images with associated text, including learning the correspondence between them for auto-annotation of regions, objects, and scenes, and building richer image representations based on the two simultaneous "views" for retrieval.

Existing approaches largely assume that the value of image tags lies purely in indicating the presence of certain objects. However, this ignores the relative importance of the different objects composing a scene, and the impact that this importance can have on a user's perception of relevance. For example, if a system were to auto-tag the bottom right image in Figure 1(c) with 'mud', 'fence', 'pole', or 'cow', not all responses would be equally useful. Arguably, it is more critical to name those objects that appear more prominent or that best define the scene (say, 'cow' in this example). Likewise, in image retrieval, the system should prefer to retrieve images that are similar not only in terms of their total object composition, but also in terms of those objects' relative importance to the scene.

How can we learn the relative importance of objects and use this knowledge to improve image retrieval? Our approach rests on the assumption that humans name the most prominent or interesting items first when asked to summarize an image. Thus, rather than treating tags simply as a set of names, we consider them as an ordered list. Specifically, we record a tag-list's nouns, their absolute ordering, and their relative rank compared to their typical placement.
We propose an unsupervised approach based on Kernel Canonical Correlation Analysis (KCCA) to discover a "semantic space" that captures the relationship between those tag cues and the image content itself, and show how it can be used to process novel queries more effectively. The three tag cues are defined as follows.

Word Frequency is a traditional bag-of-words that records the presence and count of each object. Each tag-list is mapped to a $V$-dimensional vector $W = [w_1, \ldots, w_V]$, where $w_i$ is the number of times the $i$-th word is mentioned and $V$ is the vocabulary size. This feature helps learn the connection between the low-level image features and the objects they refer to.

Relative Tag Rank encodes the relative rank of each word compared to its typical rank: $R = [r_1, \ldots, r_V]$, where $r_i$ is the percentile of the $i$-th word's rank relative to all its previous ranks observed in the training data. This feature captures the order of mention, which hints at the relative importance.

Absolute Tag Rank encodes the absolute rank of each word: $A = \left[\frac{1}{\log_2(1+a_1)}, \ldots, \frac{1}{\log_2(1+a_V)}\right]$, where $a_i$ is the average absolute rank of the $i$-th word in the tag-list. In contrast to the relative rank, this feature more directly captures the importance of each object in the same scene.

For the image features, we use a diverse set of standard descriptors: Gist, color histograms, and bag-of-visual-words (BOW) histograms. To leverage the extracted features to improve image retrieval, we use KCCA to construct a common representation (or semantic space) for both ...

[Figure: Object counts and scales (PASCAL)]
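As a rough illustration of the three tag cues defined above, the following Python sketch computes W, R, and A for a single tag-list. The function names (`build_rank_history`, `tag_features`) are hypothetical. So are the choice to leave words absent from the tag-list at zero and the specific percentile convention (the fraction of a word's training ranks that fall at or after its current rank, so that a higher value means "mentioned earlier than usual"). These are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def build_rank_history(training_tag_lists):
    """Collect, per word, every 1-based rank at which it appeared in training."""
    history = defaultdict(list)
    for tags in training_tag_lists:
        for rank, word in enumerate(tags, start=1):
            history[word].append(rank)
    return history


def tag_features(tags, vocab, history):
    """Return (W, R, A) for one ordered tag-list over a fixed vocabulary.

    Assumptions (not from the paper): absent words get 0 in all three
    vectors, and the percentile r_i is the fraction of the word's past
    ranks that are greater than or equal to its rank in this tag-list.
    """
    index = {word: i for i, word in enumerate(vocab)}
    size = len(vocab)
    W = [0.0] * size
    R = [0.0] * size
    A = [0.0] * size

    # Gather the 1-based ranks of each in-vocabulary word in this tag-list.
    ranks = defaultdict(list)
    for rank, word in enumerate(tags, start=1):
        if word in index:
            ranks[word].append(rank)

    for word, word_ranks in ranks.items():
        i = index[word]
        # Word Frequency: number of mentions of the word.
        W[i] = float(len(word_ranks))
        # Absolute Tag Rank: 1 / log2(1 + average rank in this tag-list).
        avg_rank = sum(word_ranks) / len(word_ranks)
        A[i] = 1.0 / math.log2(1.0 + avg_rank)
        # Relative Tag Rank: percentile against ranks seen in training.
        past = history.get(word, [])
        if past:
            R[i] = sum(1 for p in past if p >= avg_rank) / len(past)

    return W, R, A


if __name__ == "__main__":
    # Toy data, invented for this example only.
    training = [["cow", "fence", "mud"], ["fence", "cow"], ["mud", "cow", "pole"]]
    vocab = ["cow", "fence", "mud", "pole"]
    history = build_rank_history(training)
    W, R, A = tag_features(["cow", "mud"], vocab, history)
    print(W, R, A)
```

Under these assumptions, a word tagged first receives the largest absolute-rank weight (1.0), and the weight decays logarithmically for later mentions. In a full pipeline the three vectors would presumably serve as the tag-side input to KCCA, which is not shown here.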

Similar resources

Accounting for the Relative Importance of Objects in Image Retrieval

We introduce a method for image retrieval that leverages the implicit information about object importance conveyed by the list of keyword tags a person supplies for an image. We propose an unsupervised learning procedure based on Kernel Canonical Correlation Analysis that discovers the relationship between how humans tag images (e.g., the order in which words are mentioned) and the relative imp...

Health Image Information Retrieval on the Web from the Perspective of Medical Science Specialists: A Qualitative Study

Introduction: The medical image, as a source of non-textual information, has an important role in the field of medicine. Since the quality of life is directly related to health, employing this type of information is effective in improving the practice of health professionals. This study aimed to survey medical image retrieval on the Web from the perspective of experts in medical sciences. M...

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges of each approach have led researchers to combine them and to use semi-automatic retrieval that brings user interaction into the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...

Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Images, as an information vehicle that can convey a large volume of information, are especially important in medicine. The variety of image features and search algorithms in medical image retrieval systems, together with the lack of an authority to evaluate the quality of retrieval systems, makes a systematic review in medical image retrieval system...

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...

A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for these interest points; then, to use the Bag of Visual Words model, a cluster...

Publication date: 2010